AITopics | linguistic annotation

linguistic annotation

Tamil Language Computing: the Present and the Future

Sarveswaran, Kengatharaiyer

arXiv.org Artificial Intelligence, Jul-11-2024

This paper delves into the text processing aspects of Language Computing, which enables computers to understand, interpret, and generate human language. Focusing on tasks such as speech recognition, machine translation, sentiment analysis, text summarization, and language modelling, language computing integrates disciplines including linguistics, computer science, and cognitive psychology to create meaningful human-computer interactions. Recent advancements in deep learning have made computers more accessible and capable of independent learning and adaptation. In examining the landscape of language computing, the paper emphasises foundational work like encoding, where Tamil transitioned from ASCII to Unicode, enhancing digital communication. It discusses the development of computational resources, including raw data, dictionaries, glossaries, annotated data, and computational grammars, necessary for effective language processing. The challenges of linguistic annotation, the creation of treebanks, and the training of large language models are also covered, emphasising the need for high-quality, annotated data and advanced language models. The paper underscores the importance of building practical applications for languages like Tamil to address everyday communication needs, highlighting gaps in current technology. It calls for increased research collaboration, digitization of historical texts, and fostering digital usage to ensure the comprehensive development of Tamil language processing, ultimately enhancing global communication and access to digital services.
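The encoding point in the abstract can be made concrete with a minimal Python sketch. It is only an illustration, not material from the paper: the sample word (Tamil for "Tamil") and the helper names `describe` and `in_tamil_block` are assumptions chosen here, and the block range U+0B80 to U+0BFF is the Tamil block of the Unicode Standard.

```python
import unicodedata

# Tamil block of the Unicode Standard: U+0B80 .. U+0BFF.
TAMIL_BLOCK = range(0x0B80, 0x0C00)

# Sample word chosen for illustration (the word "Tamil" in Tamil script),
# written with escapes so the snippet survives any file encoding.
WORD = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"


def describe(text):
    """Return (code point, Unicode character name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def in_tamil_block(text):
    """True if every character lies in the Tamil Unicode block."""
    return all(ord(ch) in TAMIL_BLOCK for ch in text)


def ascii_encodable(text):
    """True if the text can be represented in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for code_point, name in describe(WORD):
        print(code_point, name)
    print("UTF-8 bytes:", len(WORD.encode("utf-8")))
    print("ASCII encodable:", ascii_encodable(WORD))
```

Each Tamil character has its own code point and takes three bytes in UTF-8, whereas plain ASCII cannot represent the word at all, which is why pre-Unicode Tamil text relied on font-specific encodings.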

application, computer, tamil, (15 more...)

arXiv: 2407.08618

Country:

Asia > Sri Lanka > Northern Province > Jaffna District > Jaffna (0.05)
South America (0.04)
North America > Central America (0.04)
Asia > India (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

A Data-Driven Representation for Sign Language Production

Walsh, Harry | Ravanshad, Abolfazl | Rahmani, Mariam | Bowden, Richard

arXiv.org Artificial Intelligence, Apr-17-2024

Phonetic representations are used when recording spoken languages, but no equivalent exists for recording signed languages. As a result, linguists have proposed several annotation systems that operate on the gloss or sub-unit level; however, these resources are notably irregular and scarce. Sign Language Production (SLP) aims to automatically translate spoken language sentences into continuous sequences of sign language. However, current state-of-the-art approaches rely on scarce linguistic resources, which has limited progress in the field. This paper introduces an innovative solution by transforming the continuous pose generation problem into a discrete sequence generation problem, thus overcoming the need for costly annotation, although, if annotation is available, we leverage the additional information to enhance our approach. By applying Vector Quantisation (VQ) to sign language data, we first learn a codebook of short motions that can be combined to create a natural sequence of sign, where the tokens of the codebook can be thought of as the lexicon of our representation. Then, using a transformer, we perform a translation from spoken language text to a sequence of codebook tokens. Each token can be directly mapped to a sequence of poses, allowing the translation to be performed by a single network. Furthermore, we present a sign stitching method to effectively join tokens together. We evaluate on the RWTH-PHOENIX-Weather-2014T (PHOENIX14T) and the more challenging Meine DGS Annotated (mDGS) datasets. An extensive evaluation shows our approach outperforms previous methods, increasing the BLEU-1 back translation score by up to 72%.
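The quantisation step described in the abstract can be sketched as a nearest-neighbour lookup. This is a toy illustration under stated assumptions, not the authors' implementation: the codebook here is hand-written rather than learned, each entry is a single 2-D vector rather than a short motion, and the names `nearest_token`, `quantise`, and `decode` are hypothetical.

```python
def nearest_token(vector, codebook):
    """Return the index of the codebook entry closest to `vector`."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def quantise(sequence, codebook):
    """Turn a continuous sequence into a discrete sequence of token ids."""
    return [nearest_token(v, codebook) for v in sequence]


def decode(tokens, codebook):
    """Map token ids back to their codebook vectors."""
    return [codebook[t] for t in tokens]


# Hand-written toy codebook (an assumption; a real one would be learned).
CODEBOOK = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
# Toy "continuous" input, standing in for pose data.
MOTIONS = [(0.1, -0.1), (0.9, 1.2), (0.2, 1.8), (1.1, 0.9)]

if __name__ == "__main__":
    tokens = quantise(MOTIONS, CODEBOOK)
    print(tokens)
    print(decode(tokens, CODEBOOK))
```

Once the data are expressed as token ids, producing sign becomes a sequence-to-sequence problem over a finite vocabulary, which is what lets a transformer translate text into codebook tokens.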

codebook, dataset, sequence, (12 more...)

arXiv: 2404.11499

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Surrey > Guildford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.68)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Speech (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

Sánchez-Cartagena, Víctor M. | Pérez-Ortiz, Juan Antonio | Sánchez-Martínez, Felipe

arXiv.org Artificial Intelligence, Jan-29-2024

This paper studies the effects of word-level linguistic annotations in under-resourced neural machine translation, for which there is incomplete evidence in the literature. The study covers eight language pairs, different training corpus sizes, two architectures, and three types of annotation: dummy tags (with no linguistic information at all), part-of-speech tags, and morpho-syntactic description tags, which consist of part of speech and morphological features. These linguistic annotations are interleaved in the input or output streams as a single tag placed before each word. In order to measure the performance under each scenario, we use automatic evaluation metrics and perform automatic error classification. Our experiments show that, in general, source-language annotations are helpful and morpho-syntactic descriptions outperform part of speech for some language pairs. On the contrary, when words are annotated in the target language, part-of-speech tags systematically outperform morpho-syntactic description tags in terms of automatic evaluation metrics, even though the use of morpho-syntactic description tags improves the grammaticality of the output. We provide a detailed analysis of the reasons behind this result.
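The interleaving scheme described in the abstract, a single tag placed before each word, can be sketched as follows. The tag strings, the `<TAG>` dummy symbol, the `|` feature separator, and the function names are assumptions made for illustration; the paper's actual tag inventory and formatting may differ.

```python
def interleave_tags(words, tags):
    """Place one tag before each word: [t1, w1, t2, w2, ...]."""
    if len(words) != len(tags):
        raise ValueError("need exactly one tag per word")
    stream = []
    for word, tag in zip(words, tags):
        stream.append(tag)
        stream.append(word)
    return stream


def dummy_tags(words, tag="<TAG>"):
    """Tags carrying no linguistic information (same symbol for every word)."""
    return [tag] * len(words)


WORDS = ["the", "cats", "sleep"]
# Part-of-speech tags (hypothetical tag names).
POS = ["DET", "NOUN", "VERB"]
# Morpho-syntactic descriptions: part of speech plus morphological features
# (hypothetical format).
MSD = ["DET", "NOUN|Number=Plur", "VERB|Number=Plur"]

if __name__ == "__main__":
    print(" ".join(interleave_tags(WORDS, dummy_tags(WORDS))))
    print(" ".join(interleave_tags(WORDS, POS)))
    print(" ".join(interleave_tags(WORDS, MSD)))
```

The dummy variant doubles the sequence length without adding information, which makes it a useful control: any gain from the linguistic tags over the dummy tags can be attributed to the annotations rather than to the longer input.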

computational linguistic, language pair, linguistic annotation, (10 more...)

doi: 10.18653/v1/2020.coling-main.349

arXiv: 2401.16078

Country:

Europe > Denmark > Capital Region > Copenhagen (0.05)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Germany > Berlin (0.04)
(11 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

50 Beginner AI Terms You Should Know Gengo AI

#artificialintelligence, Oct-24-2018, 02:27:12 GMT

Entity Extraction: An umbrella term referring to the process of adding structure to data so that a machine can read it. This may be done by humans or by a machine learning model.
Forward Chaining: A method in which a machine must work from a problem to find a potential solution. By analyzing a range of hypotheses, the AI must determine which are relevant to the problem.
General AI: An AI that could successfully do any intellectual task that any given human being currently can.
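Forward chaining, as the term is commonly used in rule-based systems, starts from known facts and applies rules repeatedly until nothing new can be derived. The sketch below shows that standard reading; the rule format, the example facts, and the function name `forward_chain` are assumptions made for illustration and do not come from the glossary.

```python
def forward_chain(facts, rules):
    """Derive every fact reachable from `facts` using `rules`.

    Each rule is a pair (premises, conclusion): when all premises are known,
    the conclusion is added to the set of known facts.
    """
    known = set(facts)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical toy rule base.
RULES = [
    (("rain",), "wet_ground"),
    (("wet_ground", "cold"), "ice"),
    (("sun",), "dry_ground"),
]

if __name__ == "__main__":
    print(sorted(forward_chain({"rain", "cold"}, RULES)))
```

The third rule never fires in this run because its premise is not among the known facts, which mirrors the glossary's point about determining which hypotheses are relevant.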

artificial intelligence, machine learning, natural language, (16 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)

GraphGrail Ai RU

#artificialintelligence, Feb-20-2018, 08:27:50 GMT

Artificial general intelligence (AGI) is a system with the cognitive abilities of a human, capable of performing a wide range of tasks and of applying its knowledge to solve unfamiliar problems without preparation.
Artificial neural network (ANN) is a mathematical model, together with its software or hardware implementation, based on the principles by which biological neural networks (the networks of nerve cells in a living organism) are organized and function.
Computational linguistics (also: mathematical linguistics) is a research area within the mathematical and computer modeling of intellectual processes in humans and animals for the creation of artificial intelligence systems; it aims to use mathematical models to describe natural languages. However, in the latter the emphasis is not on abstract models, but rather on applied methods of describing and processing language for computer systems. Computational linguists develop algorithms and applied programs for processing language information.
Linguistic annotation is one of the basic concepts of corpus linguistics.

graphgrail ai ru, machine learning, natural language, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)

Complementing Semantic Roles with Temporally Anchored Spatial Knowledge: Crowdsourced Annotations and Experiments

Vempala, Alakananda (University of North Texas) | Blanco, Eduardo (University of North Texas)

AAAI Conferences, Apr-19-2016

This paper presents a framework to infer spatial knowledge from semantic role representations. We infer whether entities are or are not located somewhere, and temporally anchor this spatial information. A large crowdsourcing effort on top of OntoNotes shows that these temporally-anchored spatial inferences are ubiquitous and intuitive to humans. Experimental results show that inferences can be performed automatically and semantic features bring significant improvement.

artificial intelligence, computational linguistic, natural language, (18 more...)

Thirtieth AAAI Conference on Artificial Intelligence

Country:

Europe (0.68)
North America > United States > Texas (0.46)
North America > United States > Colorado (0.28)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)